library(tidyverse)
library(markovchain)   # for plotting state diagram
library(igraph)
library(expm)
library(viridis)
library(viridisLite)
library(ggpubr)
library(grid)
library(gridExtra)
library(prismatic)     # for auto-contrast colors (black/white numbers in transition matrix heat map)
library(colourvalues)  # for use with viridis
library(kableExtra)
Code
set.seed(545)
Code
compute_stationary_distribution <- function(P) {
  s <- nrow(P)
  rep(1, s) %*% solve(diag(s) - P + matrix(rep(1, s * s), ncol = s))
}
This part concerns this Riddler Classic problem from FiveThirtyEight. Note: you can find solutions here, but you should try the problem on your own without looking at solutions.
You roll a fair six-sided die 6 times. Whatever the results, you paint those on the sides of a blank die. So, if your rolls were 4, 5, 2, 6, 1, 1, then your new die has no 3’s and two 1’s. Then, you repeat the process with your new die — roll it 6 times and paint the results on a blank die. Eventually, you’ll roll the same number 6 times, at which point the process stops. Let \(T\) = the number of rounds (of 6 rolls each) that you perform until stopping. (If your 6 rolls in the first round all result in the same number, then you stop with \(T=1\).)
1)
Code and run a simulation to approximate the distribution of \(T\) and its expected value, without using Markov chains. Summarize the approximate distribution in an appropriate plot, and describe the distribution in 2-3 sentences (e.g., compute and interpret a few percentiles).
We can code a simulation that uses iteration rather than Markov chains. For a single repetition, we keep “rolling” until the die shows the same number on all 6 sides, incrementing a counter each round to record \(T\).
Code
# Start with a fair 6-sided die
dice <- 1:6
# Initialize T to 0
`T` <- 0
# Set up a table to track our progress
dice_summary <- c(dice)
# Repeat the reroll process until all numbers are the same (i.e., variance 0)
while (var(dice) != 0) {
  # Roll the new die
  dice <- sample(dice, 6, replace = TRUE)
  # Increment T
  `T` <- `T` + 1
  # Update tracking table
  dice_summary <- rbind(dice_summary, c(dice))
}
# Format and print table
colnames(dice_summary) <- c("side1", "side2", "side3", "side4", "side5", "side6")
rownames(dice_summary) <- 0:`T`
as.data.frame(dice_summary)
Here, we can see it took \(T=\) 4 tries to get a sequence of all 2’s.
We can functionalize this repetition and repeat it a large number of times to get an approximate distribution for \(T\).
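A minimal sketch of this functionalized simulation (the helper name `simulate_T` and the number of repetitions are our own choices, not from the original):

```r
# Wrap one repetition of the process in a function that returns T
simulate_T <- function() {
  dice <- 1:6
  t <- 0
  # Keep rerolling until all six sides match (variance 0)
  while (var(dice) != 0) {
    dice <- sample(dice, 6, replace = TRUE)
    t <- t + 1
  }
  t
}

set.seed(545)
T_sims <- replicate(10000, simulate_T())
mean(T_sims)  # should be near the analytical value of about 9.66
```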
This approximate distribution is right-skewed with a range of [1, 53]. The mean is 9.63, the standard deviation is 5.84, and the median is 8.
2)
Define a Markov chain that will help you find \(E(T)\). Be sure to clearly define the state space. (Note: there are several ways to do this; any one is fine just be clear in your choice.)
There are many ways to define a state space for \(T\). We can define one on:
Every combination or permutation of numbers on a die: \(\{111111,111112,111113,...,666666\}\)
Every unique set of numbers on a die: \(\{1,2,3,4,5,6,12,13,...,23456,123456\}\)
The number of distinct numbers on a die: \(\{1,2,3,4,5,6\}\)
Etc.
The best state space will minimize the number of states while still preserving an appropriate amount of information. A good choice is therefore a variation on the “every combination” approach. In this approach:
Each state is an indexed list of 6 numbers.
The value at index \(i\) would represent the count of sides that contain the number \(i\).
Each value at index \(i\in\{1,...,6\}\) would therefore be an integer in the range \([0,6]\).
The sum of the values of the list must be 6.
For example:
\(1,1,1,1,1,1\) would mean that all numbers 1 through 6 are equally represented on the die.
\(2,3,0,0,0,1\) would mean that there are two sides labeled 1, three sides labeled 2, and one side labeled 6 on the die.
The full state space would look like the following, with each row being an individual state:
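One way to enumerate this state space in R (a sketch; the column names are our own choice, and the original construction is not shown):

```r
# Enumerate all six-tuples of counts in 0..6 that sum to 6.
# Stars and bars gives choose(11, 5) = 462 such states.
state_space <- expand.grid(rep(list(0:6), 6))
state_space <- state_space[rowSums(state_space) == 6, ]
colnames(state_space) <- paste0("count_", 1:6)
nrow(state_space)  # 462
```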
3)
Determine the transition matrix for your Markov chain. You might want to compute a few of the transition probabilities by hand, but you’ll probably need to write code to fill in the whole matrix.
To find the transition probabilities, we first need to determine which entries can be nonzero. We know the following:
If the current die/state is not labeled with number \(i\) on any side, then the next die cannot have number \(i\). On the other hand, if the current die has number \(i\), then the next die may or may not have number \(i\).
In other words, the nonzero indices of the next state must be a subset of the nonzero indices of the current state.
Therefore, we can check this condition first, and then populate the matrix with the multinomial probability that we roll the next state with the die from the current state.
For example, say our current state is \(2,1,0,0,3,0\). This means that our current die has two sides labeled 1, one side labeled 2, and three sides labeled 5.
Thus, the next state must have sides labeled some combination of \(\{1,2,5\}\).
The state \(0,2,1,1,1,1\) is therefore not a possible next state because this corresponds to the labels \(\{2,3,4,5,6\}\). It is not possible for sides to be labeled \(3,4,6\) since they cannot come from a die with only labels \(\{1,2,5\}\). The transition probability \(p(2,1,0,0,3,0\to0,2,1,1,1,1)\) therefore equals 0.
The state \(3,0,0,0,3,0\) is a possible next state because this corresponds to a die labeled with sides 1 and 5, which can come from the previous die since it has labels \(\{1,2,5\}\), which includes 1 and 5.
We can then calculate the multinomial transition probability \(p(2,1,0,0,3,0\to3,0,0,0,3,0)\) as follows:
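A reconstruction of this calculation (the current state supplies the category probabilities, the next state the counts; we use the convention \(0^0=1\)):

```latex
\[
p(210030 \to 300030)
= \frac{6!}{3!\,0!\,0!\,0!\,3!\,0!}
\left(\frac{2}{6}\right)^{3}\left(\frac{1}{6}\right)^{0}\left(\frac{0}{6}\right)^{0}\left(\frac{0}{6}\right)^{0}\left(\frac{3}{6}\right)^{3}\left(\frac{0}{6}\right)^{0}
= 20 \cdot \frac{1}{27} \cdot \frac{1}{8} = \frac{20}{216} \approx 0.0926
\]
```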
Notice how the current state determines the category probabilities of the multinomial distribution, while the next state determines the counts (and hence the multinomial coefficient).
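A quick sanity check of this worked probability using `dmultinom` (which drops zero-probability categories with zero counts):

```r
# Current state 210030 gives category probabilities (2,1,0,0,3,0)/6;
# next state 300030 gives the counts
p <- dmultinom(c(3, 0, 0, 0, 3, 0), prob = c(2, 1, 0, 0, 3, 0) / 6)
p  # 20/216, about 0.0926
```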
Let’s apply this logic to the entire transition matrix.
Code
# Initialize P
P <- matrix(nrow = nrow(state_space), ncol = nrow(state_space))
# Define state names for P
state_names <- state_space %>%
  unite(concat, everything(), sep = "") %>%
  pull()
Code
# Populate P
for (i in 1:nrow(state_space)) {
  # Current state
  x_current <- unlist(state_space[i, ])
  # Valid die labels for the next state
  x_current_check <- which(x_current != 0)
  for (j in 1:nrow(state_space)) {
    # Next state
    x_next <- unlist(state_space[j, ])
    # Next state die labels
    x_next_check <- which(x_next != 0)
    # Check whether the next die's labels are a subset of the current die's
    if (all(x_next_check %in% x_current_check)) {
      # If so, calculate the multinomial probability
      P[i, j] <- dmultinom(x_next, prob = x_current / 6)
    } else {
      # Otherwise, fill with 0
      P[i, j] <- 0
    }
  }
}
4)
Use the transition matrix and tools from class to solve for \(E(T)\). Compare to the simulated value from part 1.
We can now use our transition matrix and treat this as a mean time to absorption (MTTA) problem. Notice how the rows for the states \(\{600000,060000,...,000006\}\) form an identity block; this is because they are absorbing states.
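The MTTA computation can be illustrated on a toy absorbing chain (not the 462-state chain above): restrict the matrix to its transient-to-transient block \(Q\) and solve \((I-Q)\mu = \mathbf{1}\).

```r
# Toy chain: states 1 and 2 are transient, state 3 is absorbing
P_toy <- matrix(c(0.5, 0.3, 0.2,
                  0.2, 0.5, 0.3,
                  0,   0,   1), nrow = 3, byrow = TRUE)
Q <- P_toy[1:2, 1:2]                  # transient-to-transient block
mu <- solve(diag(2) - Q, rep(1, 2))   # MTTA from each transient state
mu
```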
Therefore, \(E[T]=\mu_{111111}=9.656\) rounds. This is close to the simulated value from part 1.
5)
Use the transition matrix and tools from class to solve for the distribution of \(T\), and display the distribution in an appropriate plot. Compare to the simulated distribution from part 1.
Therefore, for each n, we can find \(P(T\le n)\) (the condition \(X_0=111111\) is implied) by summing over the transition probabilities from state \(111111\) to the absorbing states. For instance, if \(n=8\), we can calculate the probability below.
Code
sum((P %^%8)["111111", absorb_states])
[1] 0.5256652
The probability \(P(T=n)\) can therefore be found by differencing the CDF, a standard property of discrete distributions: \(P(T=n)=P(T\le n)-P(T\le n-1)\).
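This differencing step can be sketched on the same kind of toy absorbing chain (made up for illustration; the actual computation uses the 462-state matrix and `%^%` from expm):

```r
# P(T <= n) is the n-step probability of having reached the absorbing state
P_toy <- matrix(c(0.5, 0.3, 0.2,
                  0.2, 0.5, 0.3,
                  0,   0,   1), nrow = 3, byrow = TRUE)
n_max <- 30
Pn <- diag(3)
cdf_T <- numeric(n_max)
for (n in 1:n_max) {
  Pn <- Pn %*% P_toy            # P_toy to the n-th power
  cdf_T[n] <- Pn[1, 3]          # absorbed by step n, starting from state 1
}
pmf_T <- diff(c(0, cdf_T))      # P(T = n) = P(T <= n) - P(T <= n - 1)
```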
We can overlay this plot onto our simulated values to see how closely they match.
Code
hist(`T`, probability = TRUE, ylim = c(0, 0.1))
lines(T_vals, T_dist, col = "red")
points(T_vals, T_dist, col = "red")
As seen in the plot, the distribution created from the transition matrix closely matches that of the simulated distribution.
Part 2
In this part you will create and solve your own problems involving Markov chains. You have lots of flexibility, but your problems should include at least one part that requires
A simulation
An analytical solution that uses either first step analysis/absorption
An analytical solution that uses stationary distributions/long run behavior
Problem
A dataset I like to work with that is actually updated every few minutes is the USGS earthquakes dataset, which lists earthquakes from the past 30 days.
Using this data, we can answer the following questions.
Construct a transition matrix for the magnitudes. Is this matrix irreducible?
Find the stationary distribution for the matrix, if it exists. Compare it to the distribution of magnitudes in the data.
Modify the transition matrix to one that can help us find the number of earthquakes until the next 5+ magnitude earthquake, \(T\).
Using the new transition matrix from part 3, find the mean time to absorption for each magnitude: the mean number of earthquakes until the next 5+ magnitude earthquake, starting from each state of magnitude 4 or lower.
Using the transition matrix, simulate chains starting with the same magnitude as the observed data. Find \(E[T]\) and plot the distribution of \(T\). Then overlay an exponential distribution with mean equal to \(E[T]\). What do you notice?
Can this chain reasonably be considered a first-order Markov chain? Evaluate using both an appropriate visual and a formal test.
Make another visual to explore if occurrences are independent. Connect the results back to previous findings. Is the Markov property necessary to model this behavior?
1)
Construct a transition matrix for the magnitudes. Is this matrix irreducible?
We can use the markovchain package to get maximum likelihood estimates for each transition probability in the matrix.
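The idea behind the MLE is simple: the estimated transition probability from state \(i\) to state \(j\) is the observed count of \(i\to j\) transitions, row-normalized. A sketch of this in base R (`mags_toy` is a made-up example sequence, not the earthquake data):

```r
# Cross-tabulate consecutive pairs, then normalize each row to sum to 1
mags_toy <- c(1, 2, 2, 1, 3, 2, 1, 1, 2, 3, 3, 1)
counts <- table(head(mags_toy, -1), tail(mags_toy, -1))
P_hat <- counts / rowSums(counts)
P_hat
```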
This matrix appears to be irreducible: every state can be reached from every other state, so all states communicate and are therefore recurrent. (The nonzero diagonal elements additionally make the chain aperiodic.) This becomes more apparent when viewing a state diagram.
Code
plot_state_diagram(P, state_names)
2)
Find the stationary distribution for the matrix, if it exists. Compare it to the distribution of magnitudes in the data.
In order to find the stationary distribution for the matrix, we can set up a system of equations to solve for it.
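A toy illustration of the computation, mirroring the `compute_stationary_distribution` helper defined at the top: the system \(\pi P = \pi\), \(\sum_i \pi_i = 1\) can be solved via the identity \(\pi = \mathbf{1}^\top (I - P + J)^{-1}\), where \(J\) is the all-ones matrix (the 2-state matrix here is made up).

```r
# Toy 2-state chain with known stationary distribution (0.8, 0.2)
P_toy <- matrix(c(0.9, 0.1,
                  0.4, 0.6), nrow = 2, byrow = TRUE)
s <- nrow(P_toy)
pi_hat <- rep(1, s) %*% solve(diag(s) - P_toy + matrix(1, s, s))
pi_hat  # (0.8, 0.2)
```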
The stationary distribution is very similar to the empirical proportions in the data. This supports the theory that in the long run, the distribution of states in an irreducible Markov chain converges to the stationary distribution.
3)
Modify the transition matrix to one that can help us find the number of earthquakes until the next 5+ magnitude earthquake, \(T\).
We can modify this matrix to help us find the number of earthquakes until the next 5+ magnitude earthquake by turning magnitudes/states 5, 6, 7, 8, 9, and 10 into absorbing states. We can do this by replacing the corresponding rows with rows of the identity matrix, as seen in the code below.
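A sketch of this modification on a made-up 4-state matrix (the index vector `absorb_idx` is our own name; in the actual analysis the absorbing indices correspond to magnitudes 5 and above):

```r
# Toy matrix; make the last two states absorbing by zeroing their rows
# and putting 1 on the diagonal
P <- matrix(c(0.6, 0.2, 0.1, 0.1,
              0.3, 0.4, 0.2, 0.1,
              0.1, 0.1, 0.5, 0.3,
              0.2, 0.2, 0.3, 0.3), nrow = 4, byrow = TRUE)
absorb_idx <- 3:4
P_absorb <- P
P_absorb[absorb_idx, ] <- 0
P_absorb[cbind(absorb_idx, absorb_idx)] <- 1
P_absorb
```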
We can observe these changes in the state diagram:
Code
plot_state_diagram(P_absorb, state_names)
Notice how vertices 5, 6, and 7 in the state diagram have an outdegree of 0.
4)
Using the new transition matrix from part 3, find the mean time to absorption for each magnitude: the mean number of earthquakes until the next 5+ magnitude earthquake, starting from each state of magnitude 4 or lower.
We can set up the following system of equations to solve for the mean steps until we reach an absorbing state, or the mean time to absorption (MTTA).
Interestingly, we find that for every transient magnitude/state in the chain, it takes on average 85-86 earthquakes to reach an absorbing magnitude/state. This makes sense given the structure of the graph: the transient states are densely connected to each other, so the chain mixes among them quickly and the starting magnitude matters little.
5)
Using the transition matrix, simulate chains starting with the same magnitude as the observed data. Find \(E[T]\) and plot the distribution of \(T\). Then overlay an exponential distribution with mean equal to \(E[T]\). What do you notice?
Let’s start out with a single chain. First, let’s enumerate our starting distribution, \(\pi_0\). We want to start, with probability 1, in the same state as the first observation in our data, so the starting distribution should put all of its mass on that state.
Next, let’s generate states until we see a magnitude of 5 or above.
Code
# Sample from the starting distribution
state <- sample(state_names, 1, prob = pi_0)
# Initialize index
i <- 2
# Initialize chain
path <- numeric()
# Set starting state
path[[1]] <- state
# Keep generating states until we encounter an absorbing state
while (!(state %in% 5:10)) {
  state <- sample(state_names, 1, prob = P[as.character(path[[i - 1]]), ])
  path[[i]] <- state
  i <- i + 1
}
cat(path)
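To simulate many chains, the single-chain loop above can be wrapped in a function and replicated; a self-contained sketch on a made-up 3-state chain (the function name, `P_toy`, and its states are our own, not from the original analysis):

```r
# One run: walk the chain from pi_0 until an absorbing state; return the
# number of steps taken
simulate_T_quake <- function(P, states, pi_0, absorbing) {
  state <- sample(states, 1, prob = pi_0)
  t <- 0
  while (!(state %in% absorbing)) {
    state <- sample(states, 1, prob = P[state, ])
    t <- t + 1
  }
  t
}

# Toy chain: magnitudes "3" and "4" transient, "5" absorbing
P_toy <- matrix(c(0.7, 0.2, 0.1,
                  0.3, 0.5, 0.2,
                  0,   0,   1), nrow = 3, byrow = TRUE,
                dimnames = list(c("3", "4", "5"), c("3", "4", "5")))
set.seed(545)
T_sims <- replicate(2000, simulate_T_quake(P_toy, c("3", "4", "5"), c(1, 0, 0), "5"))
mean(T_sims)  # near the analytical MTTA from state "3", 0.7/0.09
```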
6)
Can this chain reasonably be considered a first-order Markov chain? Evaluate using both an appropriate visual and a formal test.
We can first construct a 3-dimensional crosstab of the previous, current, and next states, and then use this to plot a stacked bar chart. If the stacked bars are level across previous states (within each panel for the current state), then the next state does not depend on the previous state given the current one, consistent with the Markov property. If not, then the sequence likely does not satisfy the Markov property.
Code
x_summary <- table(lead(mags, 1), mags, lag(mags, 1)) %>%
  as.data.frame() %>%
  # Var1 comes from lead() (the next state) and Var3 from lag() (the previous)
  rename(x_next = "Var1", x_current = "mags", x_previous = "Var3", count = "Freq")
x_summary %>%
  ggplot(aes(x = x_previous, y = count, fill = x_next)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_viridis_d() +
  labs(y = "Conditional probability given previous state") +
  facet_wrap(~x_current, labeller = label_both)
From the visual above, it looks unlikely that this chain actually follows the Markov property. Following up with a formal test:
\(H_0\): The sequence is generated from a first-order Markov chain.
\(H_A\): The null hypothesis is false.
Code
verifyMarkovProperty(mags)
Testing markovianity property on given data sequence
Chi - square statistic is: 451.4307
Degrees of freedom are: 165
And corresponding p-value is: 0
Because of the extremely low p-value, we have enough evidence to conclude that the sequence of earthquake magnitudes is not plausibly a first-order Markov chain.
7)
Make another visual to explore if occurrences are independent. Connect the results back to previous findings. Is the Markov property necessary to model this behavior?
We can repeat the analysis above, but this time only inspecting the distribution of the next state conditioning on the current state.
Code
x_summary <- table(mags, lag(mags, 1)) %>%
  as.data.frame() %>%
  # mags is the later observation (the next state) and Var2, from lag(), the current
  rename(x_next = "mags", x_current = "Var2", count = "Freq")
x_summary %>%
  ggplot(aes(x = x_current, y = count, fill = x_next)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_viridis_d() +
  labs(y = "Conditional probability given current state")
Here, we can see that the conditional distributions are more stable. It is plausible that successive magnitudes are simply independent of one another, rather than exhibiting the one-step conditional dependence of the Markov property.
Tying this back to part 5, we notice that the exponential distribution is a good fit for \(T\). The exponential distribution models waiting times between events that are independent and homogeneously distributed through time. This may have been a good hint that the events are independent and do not demonstrate the Markov property. As a result, a Markov chain analysis might not be necessary, or the best analysis to perform, for this type of data.